Different training samples will lead to different individual risks
Approximation uncertainty decreases with larger sample size
Patients with uncommon covariate patterns have more unstable risks
It is quantified by confidence intervals
Simulation parameters
Training data sample: 100 bootstrap resamples (drawn with replacement).
Training data size:
N = 400
N = 2000
N = 10000
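A minimal sketch of the bootstrap idea above, on simulated data rather than the study's data. It assumes a hypothetical single-predictor logistic model (true logit of the risk is -1 + x) and measures how wide the 95% percentile interval of one patient's predicted risk is across bootstrap refits. The helper names `simulate`, `fit_logistic` and `bootstrap_interval_width` are illustrative only.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def simulate(n, rng):
    """Hypothetical data: one predictor x, true logit(risk) = -1 + x."""
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 1 if rng.random() < sigmoid(-1.0 + x) else 0
        data.append((x, y))
    return data


def fit_logistic(data, max_iter=25, tol=1e-8):
    """Fit intercept a and slope b by Newton-Raphson."""
    a = b = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in data:
            p = sigmoid(a + b * x)
            w = p * (1.0 - p)
            g0 += y - p          # gradient, intercept
            g1 += (y - p) * x    # gradient, slope
            h00 += w             # entries of the 2x2 information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b


def bootstrap_interval_width(data, x_new, n_boot, rng):
    """Width of the 95% percentile interval of the predicted risk at x_new."""
    risks = []
    for _ in range(n_boot):
        resample = rng.choices(data, k=len(data))  # sample with replacement
        a, b = fit_logistic(resample)
        risks.append(sigmoid(a + b * x_new))
    risks.sort()
    lo = risks[int(0.025 * (n_boot - 1))]
    hi = risks[int(0.975 * (n_boot - 1))]
    return hi - lo
```

Calling `bootstrap_interval_width` on simulated training sets of 400, 2000 and 10000 rows with `n_boot=100` mirrors the design above. In this toy setting the interval should narrow as the training set grows.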
Modeling uncertainty
Lack of knowledge about the optimal model
Definition
Modeling algorithm
Hyperparameter values
Predictors to be included and transformations
All modeler choices
Normally it is not quantified at all
Simulation parameters
Algorithm: logistic regression (LR), random forest (RF) and XGBoost (XGB)
Handling of continuous predictors (LR):
Linear
Dichotomized at the median or categorized into 4 groups
Multivariable fractional polynomials
Restricted cubic splines
Variable selection (LR)
None
Backward elimination with \(\alpha = 0.01\) or \(\alpha = 0.20\)
Penalization (LR)
No
Ridge with \(\lambda\) tuned with AIC
Tree based methods
Minimum node size (RF) / maximum depth (XGB): 2 or 20
Tuning: Yes or no
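To illustrate how a single modeler choice can move an individual risk, the sketch below fits two of the listed specifications for a continuous predictor (linear, and dichotomized at the median) to the same simulated training set and compares the predicted risk for one patient. The data-generating model (true logit of the risk is -1 + x) and the function names are assumptions for illustration, not the study's setup.

```python
import math
import random
import statistics


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, max_iter=25, tol=1e-8):
    """Fit intercept a and slope b by Newton-Raphson."""
    a = b = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(a + b * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b


def compare_specifications(n, x_new, seed=0):
    """Predicted risk at x_new under two ways of handling the predictor."""
    rng = random.Random(seed)
    # Hypothetical training data: true logit(risk) = -1 + x
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [1 if rng.random() < sigmoid(-1.0 + x) else 0 for x in xs]

    # Specification 1: predictor entered linearly
    a, b = fit_logistic(xs, ys)
    risk_linear = sigmoid(a + b * x_new)

    # Specification 2: predictor dichotomized at the training median
    cut = statistics.median(xs)
    zs = [1.0 if x > cut else 0.0 for x in xs]
    a2, b2 = fit_logistic(zs, ys)
    z_new = 1.0 if x_new > cut else 0.0
    risk_dichotomized = sigmoid(a2 + b2 * z_new)

    return {"linear": risk_linear, "dichotomized": risk_dichotomized}
```

For a patient with a high predictor value, for example `compare_specifications(2000, 2.0)`, the two specifications can give clearly different risks even though they were trained on identical data.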
Applicability uncertainty
Data collection and population variability
Definition
Uncertainty due to data differences
Different variable definitions
Measurement procedures and error vary
Missing data handling
Uncertainty due to population differences
Case-mix differences in different settings
Population drift in the same setting
Different inclusion and exclusion criteria
Simulation parameters
Training population sample:
Leuven, Belgium
Malmo, Sweden
Rome, Italy
Handling of missing data:
Regression imputation
Conditional median imputation
Missing indicator imputation
Measurement of lesion:
Diameter
Volume
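The applicability settings above form a full grid. A short sketch of enumerating it follows; the dictionary keys are hypothetical labels. The grid has 3 x 3 x 2 = 18 combinations, which agrees with the 18 shown for the applicability results, assuming that figure counts these combinations.

```python
from itertools import product

# Options taken from the lists above; key names are illustrative only.
centers = ["Leuven", "Malmo", "Rome"]
missing_data = [
    "regression imputation",
    "conditional median imputation",
    "missing indicator imputation",
]
lesion_measure = ["diameter", "volume"]

scenarios = [
    {"center": c, "missing_data": m, "lesion_measure": l}
    for c, m, l in product(centers, missing_data, lesion_measure)
]

print(len(scenarios))  # 18
```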
Experiment
Aim: evaluate different uncertainty categories in ovarian cancer prediction
Fixed validation set of n = 100 patients from the center in Leuven.
Train models varying the different categories of uncertainty.
Calculate the individual risk for each patient for each model.
Illustrate variability in the individual risks.
Preferred modelling:
Logistic regression with restricted cubic splines, no penalization and no variable selection
Missing data imputed with regression
Lesion measured in diameter and data from Leuven
Preferred modelling (1): panels for training sample 400 and training sample 10000
Approximation uncertainty (100): panels for training sample 400 and training sample 10000
Modelling uncertainty (33): panels for training sample 400 and training sample 10000
Modelling and approximation uncertainty (3300): panels for training sample 400 and training sample 10000
Applicability (18): panels for training sample 400 and training sample 10000
Applicability and approximation (1800): panels for training sample 400 and training sample 10000
All sources of uncertainty (594000): panels for training sample 400 and training sample 10000
Across models, a patient's predicted risk ranges over 39 percentage points on average, even with n = 10000.
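A sketch of how such an average range could be computed from a table of predictions, with one row per model and one column per validation patient. It assumes "range" means maximum minus minimum across models; the slides do not say whether a percentile interval was used instead. The toy numbers are invented for illustration.

```python
def risk_ranges(predictions):
    """predictions[m][i] is the risk for patient i under model m.

    Returns, per patient, the max minus min risk across models.
    """
    n_patients = len(predictions[0])
    ranges = []
    for i in range(n_patients):
        risks = [model_preds[i] for model_preds in predictions]
        ranges.append(max(risks) - min(risks))
    return ranges


def mean_range(predictions):
    """Average of the per-patient ranges."""
    ranges = risk_ranges(predictions)
    return sum(ranges) / len(ranges)


# Hypothetical toy input: 3 models (rows) x 3 patients (columns)
toy = [
    [0.10, 0.50, 0.80],
    [0.20, 0.30, 0.85],
    [0.15, 0.60, 0.70],
]
```

With the toy input, the per-patient ranges are about 0.10, 0.30 and 0.15, so `mean_range(toy)` is about 0.18.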
Conclusion
Prediction models that perform well still estimate very uncertain individual risks.
Classic uncertainty measurement (CI) is not enough to quantify uncertainty in individual risks.
Approximation uncertainty is reduced with sample size but total uncertainty is dominated by modeling and applicability uncertainty.
There is no need to be skeptical about the models: good population-level performance is enough to justify using them as a decision strategy.
We should be humble when talking about Personalized Medicine
There are no "individual risks", only risk estimates for individual patients.
What can be done about uncertainty?
Approximation: Use a large enough sample size.
Modeling: Better education and use of best practices (guidelines)
Data: Avoid retrospective studies and standardize measurements and definitions.
Population: Multicenter studies to assess heterogeneity.
Embrace uncertainty
Further reading
Van Calster, B. et al. Performance evaluation of predictive AI models to support medical decisions: Overview and guidance. Preprint at arXiv (2024).
Altman, D. G. & Royston, P. What do we mean by validating a prognostic model? Statist. Med. 19, 453–473 (2000).
Hüllermeier, E. & Waegeman, W. Aleatoric and epistemic uncertainty in machine learning: an introduction to concepts and methods. Mach Learn 110, 457–506 (2021).
Gruber, C. et al. Sources of Uncertainty in Machine Learning – A Statisticians’ View. Preprint at arXiv (2023).
Riley, R. D. & Collins, G. S. Stability of clinical prediction models developed using statistical or machine learning methods. Biometrical Journal 65, 2200302 (2023).
Riley, R. D. et al. Clinical prediction models and the multiverse of madness. BMC Medicine 21, 502 (2023).
Tsegaye, B. et al. Larger sample sizes are needed when developing a clinical prediction model using machine learning in oncology: methodological systematic review. Journal of Clinical Epidemiology 180, (2025).
Pate, A. et al. The uncertainty with using risk prediction models for individual decision making: an exemplar cohort study examining the prediction of cardiovascular disease in English primary care. BMC Medicine 17, 134 (2019).
Stern, R. H. Individual Risk. The Journal of Clinical Hypertension 14, 261–264 (2012).